Sequence-To-Sequence Neural Networks Inference on Embedded Processors Using Dynamic Beam Search
Sequence-to-sequence deep neural networks have become the state of the art for a variety of machine learning applications, ranging from neural machine translation (NMT) to speech recognition. Many mobile and Internet of Things (IoT) applications would benefit from the ability to perform sequence-to-sequence inference directly on embedded devices, thereby reducing the amount of raw data transmitted to the cloud and obtaining benefits in terms of response latency, energy consumption and security. However, due to the high computational complexity of these models, specific optimization techniques are needed to achieve acceptable performance and energy consumption on single-core embedded processors. In this paper, we present a new optimization technique called dynamic beam search, in which the inference complexity is tuned to the difficulty of the processed input sequence at runtime. Results based on measurements on a real embedded device, and on three state-of-the-art deep learning models, show that our method is able to reduce the inference time and energy by up to 25% without loss of accuracy.
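The idea of tuning the beam width to the difficulty of the input can be illustrated with a minimal sketch. This is not the algorithm from the paper: the confidence rule (`choose_width`), the `step` model interface, and the toy model are all assumptions introduced here for illustration only.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: maps a token prefix to next-token probabilities.
StepFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def choose_width(probs: Dict[str, float], min_width: int, max_width: int,
                 confidence: float) -> int:
    """Assumed rule: use a narrow beam when the top token is confident."""
    return min_width if max(probs.values()) >= confidence else max_width


def dynamic_beam_search(step: StepFn, eos: str, max_len: int,
                        min_width: int = 1, max_width: int = 4,
                        confidence: float = 0.8) -> Tuple[Tuple[str, ...], int]:
    """Return the best hypothesis and the number of model calls made."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]  # (prefix, log-prob)
    model_calls = 0
    for _ in range(max_len):
        if all(prefix and prefix[-1] == eos for prefix, _ in beams):
            break  # every hypothesis has finished
        candidates: List[Tuple[Tuple[str, ...], float]] = []
        width = None
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((prefix, score))  # keep finished hypotheses
                continue
            probs = step(prefix)
            model_calls += 1
            if width is None:
                # Width is decided from the best unfinished hypothesis.
                width = choose_width(probs, min_width, max_width, confidence)
            for token, p in probs.items():
                if p > 0.0:
                    candidates.append((prefix + (token,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:width]
    return beams[0][0], model_calls


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Toy stand-in for a real decoder (purely illustrative)."""
    if len(prefix) == 0:
        return {"a": 0.9, "b": 0.1}
    if len(prefix) == 1:
        return {"</s>": 0.95, "a": 0.05}
    return {"</s>": 1.0}


narrow_best, narrow_calls = dynamic_beam_search(toy_step, "</s>", max_len=5)
# A confidence threshold above 1.0 can never be met, so the beam stays wide.
wide_best, wide_calls = dynamic_beam_search(toy_step, "</s>", max_len=5,
                                            confidence=1.1)
```

On this toy input both runs return the same hypothesis, but the narrow run makes fewer model calls, which is the kind of saving an input-dependent beam width is meant to capture.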
Two-stage Human Activity Recognition on Microcontrollers with Decision Trees and CNNs
Human Activity Recognition (HAR) has become an increasingly popular task for embedded devices such as smartwatches. Most HAR systems for ultra-low power devices are based on classic Machine Learning (ML) models, whereas Deep Learning (DL), although reaching state-of-the-art accuracy, is less popular due to its high energy consumption, which poses a significant challenge for battery-operated and resource-constrained devices. In this work, we bridge the gap between on-device HAR and DL thanks to a hierarchical architecture composed of a decision tree (DT) and a one-dimensional Convolutional Neural Network (1D CNN). The two classifiers operate in a cascaded fashion on two different sub-tasks: the DT classifies only the easiest activities, while the CNN deals with more complex ones. With experiments on a state-of-the-art dataset and targeting a single-core RISC-V MCU, we show that this approach saves up to 67.7% energy with respect to a 'stand-alone' DL architecture at iso-accuracy. Additionally, the two-stage system either introduces a negligible memory overhead (up to 200 B) or, on the contrary, reduces the total memory occupation.
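The cascaded operation described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function names, the label set, and the two toy classifiers are hypothetical stand-ins for a trained DT and 1D CNN.

```python
from typing import Callable, List, Sequence, Set

Classifier = Callable[[Sequence[float]], str]

# Records which stage handled each input, to show how often the
# expensive model is actually invoked.
stage_log: List[str] = []


def two_stage_predict(window: Sequence[float], cheap_model: Classifier,
                      costly_model: Classifier, easy_labels: Set[str]) -> str:
    """Run the cheap model first; fall back to the costly one if needed."""
    first_guess = cheap_model(window)
    if first_guess in easy_labels:
        stage_log.append("cheap")
        return first_guess
    stage_log.append("costly")
    return costly_model(window)


def toy_tree(window: Sequence[float]) -> str:
    """Hypothetical stand-in for the DT: flags low-energy windows as 'still'."""
    energy = sum(v * v for v in window) / len(window)
    return "still" if energy < 0.01 else "other"


def toy_cnn(window: Sequence[float]) -> str:
    """Hypothetical stand-in for the 1D CNN on the harder activities."""
    return "running" if max(window) > 1.0 else "walking"


EASY = {"still"}
predictions = [
    two_stage_predict([0.0, 0.01, -0.01], toy_tree, toy_cnn, EASY),
    two_stage_predict([0.5, -0.4, 0.6], toy_tree, toy_cnn, EASY),
    two_stage_predict([1.5, -1.2, 0.9], toy_tree, toy_cnn, EASY),
]
```

In this sketch the costly model runs only for inputs that the cheap model does not assign to an easy class, so the energy saved depends on how often easy activities occur.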
Ambipolar suppression of superconductivity by ionic gating in optimally-doped BaFe2(As,P)2 ultrathin films
Superconductivity (SC) in the Ba-122 family of iron-based compounds can be
controlled by aliovalent or isovalent substitutions, applied external pressure,
and strain, the combined effects of which are sometimes studied within the same
sample. Most often, the result is limited to a shift of the SC dome to
different doping values. In a few cases, the maximum SC transition at optimal
doping can also be enhanced. In this work, we study the combination of charge
doping together with isovalent P substitution and strain by performing ionic
gating experiments on BaFe2(As,P)2 ultrathin films. We
show that the polarization of the ionic gate induces modulations to the
normal-state transport properties that can be mainly ascribed to surface charge
doping. We demonstrate that ionic gating can only shift the system away from
the optimal conditions, as the SC transition temperature is suppressed by both
electron and hole doping. We also observe a broadening of the resistive
transition, which suggests that the SC order parameter is modulated
nonhomogeneously across the film thickness, in contrast with earlier reports on
charge-doped standard BCS superconductors and cuprates.
Comment: 10 pages, 5 figures
Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays
Ultra-low-resolution Infrared (IR) array sensors offer a low-cost,
energy-efficient, and privacy-preserving solution for people counting, with
applications such as occupancy monitoring. Previous work has shown that Deep
Learning (DL) can yield superior performance on this task. However, the
literature was missing an extensive comparative analysis of various efficient
DL architectures for IR array-based people counting that considers not only
their accuracy, but also the cost of deploying them on memory- and
energy-constrained Internet of Things (IoT) edge nodes. In this work, we
address this need by comparing 6 different DL architectures on a novel dataset
composed of IR images collected from a commercial 8x8 array, which we made
openly available. With a wide architectural exploration of each model type, we
obtain a rich set of Pareto-optimal solutions, spanning cross-validated
balanced accuracy scores in the 55.70-82.70% range. When deployed on a
commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these
models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference,
while consuming 17.18-120.43 µJ of energy. Our models are significantly
more accurate than a previous deterministic method (up to +39.9%), while being
up to 3.53x faster and more energy efficient. Further, our models' accuracy is
comparable to state-of-the-art DL solutions on similar resolution sensors,
despite a much lower complexity. All our models enable continuous, real-time
inference on an MCU-based IoT node, with years of autonomous operation without
battery recharging.
Comment: This article has been accepted for publication in IEEE Internet of
Things Journal
Phonon dispersion and lifetimes in MgB2
We measure phonon dispersion and linewidth in a single crystal of MgB_2 along
the Gamma-A, Gamma-M and A-L directions using inelastic X-Ray scattering. We
use Density Functional Theory to compute the effect of both electron-phonon
coupling and anharmonicity on the linewidth, obtaining excellent agreement with
experiment. Anomalous broadening of the E_2g phonon mode is found all along
Gamma-A. The dominant contribution to the linewidth is always the
electron-phonon coupling.
Comment: 4 pages, 3 figures
Privacy-preserving Social Distance Monitoring on Microcontrollers with Low-Resolution Infrared Sensors and CNNs
Low-resolution infrared (IR) array sensors offer a low-cost, low-power, and
privacy-preserving alternative to optical cameras and smartphones/wearables
for social distance monitoring in indoor spaces, permitting the recognition of
basic shapes without revealing the personal details of individuals. In this
work, we demonstrate that an accurate detection of social distance violations
can be achieved by processing the raw output of an 8x8 IR array sensor with a
small-sized Convolutional Neural Network (CNN). Furthermore, the CNN can be
executed directly on a Microcontroller (MCU)-based sensor node. With results
on a newly collected open dataset, we show that our best CNN achieves 86.3%
balanced accuracy, significantly outperforming the 61% achieved by a
state-of-the-art deterministic algorithm. Changing the architectural
parameters of the CNN, we obtain a rich Pareto set of models, spanning
70.5-86.3% accuracy and 0.18-75k parameters. Deployed on an STM32L476RG MCU,
these models have a latency of 0.73-5.33ms, with an energy consumption per
inference of 9.38-68.57µJ.
Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes
With the increasing popularity of Internet of Things (IoT) devices, there is a growing need for energy-efficient Machine Learning (ML) models that can run on constrained edge nodes. Decision tree ensembles, such as Random Forests (RFs) and Gradient Boosted Trees (GBTs), are particularly suited for this task, given their relatively low complexity compared to other alternatives. However, their inference time and energy costs are still significant for edge hardware. Given that these costs grow linearly with the ensemble size, this paper proposes the use of dynamic ensembles that adjust the number of executed trees based both on a latency/energy target and on the complexity of the processed input, to trade off computational cost and accuracy. We focus on deploying these algorithms on multi-core low-power IoT devices, designing a tool that automatically converts a Python ensemble into optimized C code, and exploring several optimizations that account for the available parallelism and memory hierarchy. We extensively benchmark both static and dynamic RFs and GBTs on three state-of-the-art IoT-relevant datasets, using an 8-core ultra-low-power System-on-Chip (SoC), GAP8, as the target platform. Thanks to the proposed early-stopping mechanisms, we achieve an energy reduction of up to 37.9% with respect to static GBTs (8.82 µJ vs 14.20 µJ per inference) and 41.7% with respect to static RFs (2.86 µJ vs 4.90 µJ per inference), without losing accuracy compared to the static model.
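Early stopping in an ensemble can be sketched as below. This is an illustrative assumption, not the paper's mechanism or its generated C code: the stopping rule (a margin between the two highest averaged class scores), the `max_trees` budget parameter, and the toy trees are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical tree interface: returns one probability per class (>= 2 classes).
Tree = Callable[[Sequence[float]], List[float]]


def dynamic_ensemble_predict(x: Sequence[float], trees: List[Tree],
                             margin_threshold: float,
                             max_trees: Optional[int] = None) -> Tuple[int, int]:
    """Return (predicted class, number of trees actually executed)."""
    budget = len(trees) if max_trees is None else min(max_trees, len(trees))
    if budget <= 0:
        raise ValueError("at least one tree must be executed")
    totals: List[float] = []
    used = 0
    for tree in trees[:budget]:
        probs = tree(x)
        totals = list(probs) if not totals else [t + p for t, p in zip(totals, probs)]
        used += 1
        ranked = sorted(totals, reverse=True)
        # Averaged gap between the two highest-scoring classes so far.
        margin = (ranked[0] - ranked[1]) / used
        if margin >= margin_threshold:
            break  # input looks easy: skip the remaining trees
    best = max(range(len(totals)), key=totals.__getitem__)
    return best, used


def confident_tree(x: Sequence[float]) -> List[float]:
    """Toy tree that is sure of class 0 (illustrative only)."""
    return [0.9, 0.1]


def unsure_tree(x: Sequence[float]) -> List[float]:
    """Toy tree that barely prefers class 0 (illustrative only)."""
    return [0.55, 0.45]


easy_result = dynamic_ensemble_predict([0.0], [confident_tree] * 4, 0.5)
hard_result = dynamic_ensemble_predict([0.0], [unsure_tree] * 4, 0.5)
capped_result = dynamic_ensemble_predict([0.0], [unsure_tree] * 4, 0.5, max_trees=2)
```

Since the cost grows linearly with the number of executed trees, the second element of each result is a rough proxy for the latency and energy spent on that input.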
Advanced surface characterization of Ba(Fe_(0.92)Co_(0.08))_2As_2 epitaxial thin films
We report on the systematic characterization of Ba(Fe_(0.92)Co_(0.08))_2As_2 epitaxial thin films on a CaF2 substrate in view of their possible use for superconducting electronic applications. By using different and complementary techniques we studied the morphological characteristics of the surface, the structural properties, the magnetic response, and the superconducting properties in terms of critical temperature, critical current, and energy gaps. Particular attention was paid to the homogeneity of the films and to the comparison of their superconducting properties with those of single crystals of the same compound.